(57) Abstract

The present invention provides polynucleotide sequences of the genome of Streptococcus pneumoniae, polypeptide sequences encoded 
by the polynucleotide sequences, corresponding polynucleotides and polypeptides, vectors and hosts comprising the polynucleotides, and 
assays and other uses thereof. The present invention further provides polynucleotide and polypeptide sequence information stored on 
computer readable media, and computer-based systems and methods which facilitate its use. 
presence of Streptococcus pneumoniae in a sample, hereinafter referred to as
diagnostic fragments or DFs. 

Each of the ORFs in fragments of the Streptococcus pneumoniae genome
disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in
numerous ways as polynucleotide reagents. For instance, the sequences can be
used as diagnostic probes or amplification primers for detecting or determining the
presence of a specific microbe in a sample, to selectively control gene expression in
a host, and in the production of polypeptides, such as polypeptides encoded by
ORFs of the present invention, particularly those polypeptides that have a
pharmacological activity.
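As a purely illustrative sketch of the diagnostic-probe use described above, the following Python fragment reports where a probe sequence occurs in a sample sequence on either strand. The function names, the exact-match criterion and the toy sequences are assumptions made for illustration; they are not taken from the disclosure, and a laboratory hybridization assay tolerates mismatches that this sketch does not model.

```python
from typing import List, Tuple

# Complement table for unambiguous DNA bases only (illustrative assumption).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def probe_hits(probe: str, sample: str) -> List[Tuple[int, str]]:
    """Return (0-based position, strand) for each exact probe match.

    Hypothetical helper: the "+" strand is the sample as given, and the
    "-" strand is searched by looking for the probe's reverse complement.
    """
    sample = sample.upper()
    hits = []
    for strand, query in (("+", probe.upper()), ("-", reverse_complement(probe))):
        start = sample.find(query)
        while start != -1:
            hits.append((start, strand))
            start = sample.find(query, start + 1)
    return sorted(hits)
```

An empty result means that no exact copy of the probe was found in the sample sequence on either strand.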

The present invention further includes recombinant constructs comprising
one or more fragments of the Streptococcus pneumoniae genome of the present
invention. The recombinant constructs of the present invention comprise vectors,
such as a plasmid or viral vector, into which a fragment of the Streptococcus
pneumoniae genome has been inserted.

The present invention further provides host cells containing any of the
isolated fragments of the Streptococcus pneumoniae genome of the present
invention. The host cells can be a higher eukaryotic host cell, such as a mammalian
cell, a lower eukaryotic cell, such as a yeast cell, or a procaryotic cell such as a
bacterial cell.

The present invention is further directed to isolated polypeptides and 
proteins encoded by ORFs of the present invention. A variety of methods, well 
known to those of skill in the art, routinely may be utilized to obtain any of the 
polypeptides and proteins of the present invention. For instance, polypeptides and
proteins of the present invention having relatively short, simple amino acid 
sequences readily can be synthesized using commercially available automated 
peptide synthesizers. Polypeptides and proteins of the present invention also may 
be purified from bacterial cells which naturally produce the protein. Yet another 
alternative is to purify polypeptides and proteins of the present invention from cells
which have been altered to express them.

The invention further provides methods of obtaining homologs of the 
fragments of the Streptococcus pneumoniae genome of the present invention and 
homologs of the proteins encoded by the ORFs of the present invention. 
Specifically, by using the nucleotide and amino acid sequences disclosed herein as 
a probe or as primers, and techniques such as PCR cloning and colony/plaque
hybridization, one skilled in the art can obtain homologs.

The invention further provides antibodies which selectively bind
polypeptides and proteins of the present invention. Such antibodies include both
monoclonal and polyclonal antibodies.

The invention further provides hybridomas which produce the above-described
antibodies. A hybridoma is an immortalized cell line which is capable of
secreting a specific monoclonal antibody.

The present invention further provides methods of identifying test samples
derived from cells which express one of the ORFs of the present invention, or a
homolog thereof. Such methods comprise incubating a test sample with one or
more of the antibodies of the present invention, or one or more of the DFs of the
present invention, under conditions which allow a skilled artisan to determine if the
sample contains the ORF or product produced therefrom.

In another embodiment of the present invention, kits are provided which
contain the necessary reagents to carry out the above-described assays.

Specifically, the invention provides a compartmentalized kit to receive, in
close confinement, one or more containers which comprises: (a) a first container
comprising one of the antibodies, or one of the DFs of the present invention; and
(b) one or more other containers comprising one or more of the following: wash




Using the isolated proteins of the present invention, the present invention
further provides methods of obtaining and identifying agents capable of binding to
a polypeptide or protein encoded by one of the ORFs of the present invention.
Specifically, such agents include, as further described below, antibodies, peptides,
carbohydrates, pharmaceutical agents and the like. Such methods comprise steps
of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of
the present invention; and (b) determining whether the agent binds to said protein.

The present genomic sequences of Streptococcus pneumoniae will be of
great value to all laboratories working with this organism and for a variety of
commercial purposes. Many fragments of the Streptococcus pneumoniae genome 
will be immediately identified by similarity searches against GenBank or protein 
databases and will be of immediate value to Streptococcus pneumoniae researchers 



WO 98/18931    PCT/US97/19588



and for immediate commercial value for the production of proteins or to control 
gene expression. 

The methodology and technology for elucidating extensive genomic
sequences of bacterial and other genomes has enhanced, and will continue to
greatly enhance, the ability to analyze and understand chromosomal organization.
In particular, sequenced contigs and genomes will provide the models for
developing tools for the analysis of chromosome structure and function, including
the ability to identify genes within large segments of genomic DNA, the structure,
position, and spacing of regulatory elements, the identification of genes with
potential industrial applications, and the ability to perform comparative genomic
and molecular phylogeny studies.

DESCRIPTION OF THE FIGURES

FIGURE 1 is a block diagram of a computer system (102) that can be
used to implement the computer-based systems of the present invention.

FIGURE 2 is a schematic diagram depicting the data flow and computer
programs used to collect, assemble, edit and annotate the contigs of the
Streptococcus pneumoniae genome of the present invention. Both Macintosh and
Unix platforms are used to handle the AB 373 and 377 sequence data files, largely
as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii
International Conference on System Sciences, 585, IEEE Computer Society Press,
Washington D.C. (1993). Factura (AB) is a Macintosh program designed for
automatic vector sequence removal and end-trimming of sequence files. The
program Loadis runs on a Macintosh platform and parses the feature data extracted
from the sequence files by Factura to the Unix based Streptococcus pneumoniae
relational database. Assembly of contigs (and whole genome sequences) is
accomplished by retrieving a specific set of sequence files and their associated
features using Extrseq, a Unix utility for retrieving sequences from an SQL
database. The resulting sequence file is processed by seq_filter to trim portions of
the sequences with more than 2% ambiguous nucleotides. The sequence files were
assembled using TIGR Assembler, an assembly engine designed at The Institute
for Genomic Research (TIGR) for rapid and accurate assembly of thousands of
sequence fragments. The collection of contigs generated by the assembly step is
loaded into the database with the lassie program. Identification of open reading
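The trimming step attributed above to seq_filter (removing portions of a read with more than 2% ambiguous nucleotides) can be sketched in Python as follows. This is not the seq_filter program itself, whose algorithm is not described here; the sliding-window approach, the 50-base window and the function names are assumptions chosen only to make the 2% criterion concrete.

```python
UNAMBIGUOUS = frozenset("ACGT")


def ambiguous_fraction(seq: str) -> float:
    """Fraction of bases in seq that are not A, C, G or T (0.0 if empty)."""
    if not seq:
        return 0.0
    return sum(base not in UNAMBIGUOUS for base in seq.upper()) / len(seq)


def trim_ambiguous_ends(seq: str, window: int = 50, max_fraction: float = 0.02) -> str:
    """Trim both ends of a read until the terminal windows pass the threshold.

    Assumed strategy (hypothetical, for illustration): move inward one base
    at a time while the terminal window holds more than max_fraction
    ambiguous bases, then drop any ambiguous bases left at the very ends.
    """
    start, end = 0, len(seq)
    # Trim from the 5' end.
    while start < end and ambiguous_fraction(seq[start:min(start + window, end)]) > max_fraction:
        start += 1
    # Trim from the 3' end.
    while start < end and ambiguous_fraction(seq[max(end - window, start):end]) > max_fraction:
        end -= 1
    # Remove ambiguous bases still sitting at either end.
    while start < end and seq[start].upper() not in UNAMBIGUOUS:
        start += 1
    while start < end and seq[end - 1].upper() not in UNAMBIGUOUS:
        end -= 1
    return seq[start:end]
```

A read consisting entirely of ambiguous calls is trimmed to an empty string, and a read with no ambiguous calls is returned unchanged.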
GTTTCGATTG CAGTTGTTGT TGGAAATTGT GTTTTTTCTA CAACGTTAAA GTTTTCATCA 7620 

CCGACAGCAC AGACAAACTT TGTACCGCCC GCTTCCAACC TTCCATATAA TTTTGTCATG 7680 

ATAAACCTCT TGTTTTTATT TTCTTTATTA TAGCATACTT CGAAAGTCTA AATGTCTCTA 7740 

TTTTTTAGAT TTTCCTCTCT AAATCTTACT ATCTAATAAA AACGAACAAA CATGTCATTT 7800 

GTTCGTTTTC ACATTAGAGA GGATTGATTA GATTTTCACT TCGATCACAG CATCCCCCTT 7860 

AGCAACTGAA CCTGTTGCGA CTGGAGCTAC TGAAGCOTAG TCACCTGTAT TTGTAACGAT 7920 

AACCATTGTT GTATCATCAA GTCCAGCTGC AGCGATTTTG TTTGAGTCAA ATGTTCCAAG 7980 

AACATCGCCA GCTTTCACCT TATTACCTTG AGCAACTTTT GTTTCAAAAC CGTCAOCGTT 8040 

CATAGATACA GTATCAATAC CAACATGAAT CAAAACTTCA GCACCATTTC TTGTTTTCAA 8100 

ACCAAAAGCG TGCCCTGTTG GAAAGGCAAT TGAAACTTCA GCATCAGCTC GTGCATAGAC 8160 

CACGCCTTGG CTTGGTTTCA CAACGATACC TTGTCCCATA GCTCCACTTG AGAAGACTGG 8220 

GTCATTGACA TCAGCAAGAG CGACAACATC ACCGACGATA GCAGTTACAA GTGTTTCATT 8280 

TTGAAGAGCT GCTGGCGCAA C ' lTCf TCTTT TTCTTCAGCC ACTTCAGCTC GTTTTQCACC 8340 

TGCAGTTGCG TCTACTTCAT CTTCGTAACC AAACATGTAA GTAAGAGCAA AACCAAGGCC 8400 

AAATGATACA GCTACCATAA GAACCTATTG TGGAAGTTCT CCGTTACCAA CATAAAGCAT 8460 

TGTACCAGGG ATGATG6TGA TACCATTACC AGTACCACCA AGTCCAACCA TACAAGCCAA 8520 

TCCACCACCG ATTGCACCAG CAATCAATGA AAGGAAGAAT CGTTTACGGA AGCGCAAGTT 8580 

CACCCCGAAG ATAGCAGGCT CTGTAATACC TAGGAACCCA GAAACACCAG CCGGGAAAGC 8640 

AA GTG TTTTC AGTTTTGGAT T T TTTl.'r'rTT AACACCAACC GCAACAGTAG CAGCACCTTC 8700 

AGCTGTCATA GCAGCTGTGA TCATAGCCTT GAATGCGTTA GCATGGTCAG CACCAACTAA 8760 

TTGCACTTCA AGCAAGTTGA AGATGTCGTG CACACCTGAC AGGACCATCA ATTGGTGAAC 8820 

CCCACCAATC AAGAAACCAC CAAGACCAAA TGGCATGCTA AGAATCGCTT TTGTAGCAAT 8880 

AAGGATGTAG TTTTCAACAA CGTGGAAAAC TGGTCCAATG ACAAAGAGTC CAAGGATACA 8940 

CATGACCAAA ACTGTCACGA ATCCTGTTAC CAAGAGGTCA ATGACATCTG GAACAACTTG 9000 

CGGACAGCTT TTTCAXATTT: AGCTCCGACA - ACCCCGATGA TGAAGGCTGG AAGAACGGAA 9060 

CCTTGCAAAC CAACAACAGC GATGAAACCA AAGAACTTCA TCGCTGTTAC TTCACCACCT 9120 

TGAGCAACTG CCCAAGCGTT TGGAAGTGAG CCAGAGACAA GCATCATACC AAGAACGATA 9180 

CCAACGGCAG GATTTCCACC AAATACACGG AAGGTTGACC ACACAACCAA ACCTCGCAAG 9240 

ATCATCAACG CTGTATCTGT CAAGATTTGT GTGTAAGTTG CAAAGTCACC TGGAAGTGGC 9300 

What Is Claimed Is: 



1. Computer readable medium having recorded thereon the nucleotide 
sequence depicted in SEQ ID NOS: 1-391, a representative fragment thereof or a 
nucleotide sequence at least 95% identical to a nucleotide sequence depicted in SEQ 
ID NOS: 1-391. 

2. Computer readable medium having recorded thereon any one of the 
fragments of SEQ ID NOS: 1-391 depicted in Tables 2 and 3 or a degenerate variant 
thereof. 

3. The computer readable medium of claim 1, wherein said medium is 
selected from the group consisting of a floppy disc, a hard disc, random access 
memory (RAM), read only memory (ROM), and CD-ROM. 

4. The computer readable medium of claim 3, wherein said medium is 
selected from the group consisting of a floppy disc, a hard disc, random access 
memory (RAM), read only memory (ROM), and CD-ROM. 

5. A computer-based system for identifying fragments of the Streptococcus 
pneumoniae genome of commercial importance comprising the following elements: 

a) a data storage means comprising the nucleotide sequence of SEQ ID 
NOS: 1-391, a representative fragment thereof, or a nucleotide sequence at least 
95% identical to a nucleotide sequence of SEQ ID NOS: 1-391; 

b) search means for comparing a target sequence to the nucleotide sequence 
of the data storage means of step (a) to identify homologous sequence(s), and 

c) retrieval means for obtaining said homologous sequence(s) of step (b). 

6. A method for identifying commercially important nucleic acid fragments 
of the Streptococcus pneumoniae genome comprising the step of comparing a 
database comprising the nucleotide sequences depicted in SEQ ID NOS: 1-391, a 
representative fragment thereof, or a nucleotide sequence at least 95% identical to a 
nucleotide sequence of SEQ ID NOS: 1-391 with a target sequence to obtain a 
nucleic acid molecule comprised of a complementary nucleotide sequence to said 
target sequence, wherein said target sequence is not randomly selected. 
7. A method for identifying an expression modulating fragment of 
Streptococcus pneumoniae genome comprising the step of comparing a database 
comprising the nucleotide sequences depicted in SEQ ID NOS: 1-391, a 
representative fragment thereof, or a nucleotide sequence at least 95% identical to 
the nucleotide sequence of SEQ ID NOS: 1-391 with a target sequence to obtain a 
nucleic acid molecule comprised of a complementary nucleotide sequence to said 
target sequence, wherein said target sequence comprises sequences known to 
regulate gene expression. 

8. An isolated protein-encoding nucleic acid fragment of the Streptococcus 
pneumoniae genome, wherein said fragment consists of the nucleotide sequence of 
any one of the fragments of SEQ ID NOS: 1-391 depicted in Tables 2 and 3, or a 
degenerate variant thereof. 

9. A vector comprising any one of the fragments of the Streptococcus 
pneumoniae genome SEQ ID NOS: 1-391 depicted in Tables 2 and 3 or a 
degenerate variant thereof. 

10. An isolated fragment of the Streptococcus pneumoniae genome, 
wherein said fragment modulates the expression of an operably linked open reading 
frame, wherein said fragment consists of the nucleotide sequence from about 10 to 
200 bases in length which is 5' to any one of the open reading frames depicted in 
Tables 2 and 3 or a degenerate variant thereof. 

11. A vector comprising any one of the fragments of the Streptococcus 
pneumoniae genome of claim 8. 

12. An organism which has been altered to contain any one of the 
fragments of the Streptococcus pneumoniae genome of claim 8. 

13. An organism which has been altered to contain any one of the 
fragments of the Streptococcus pneumoniae genome of claim 10. 
14. A method for regulating the expression of a nucleic acid molecule 
comprising the step of covalently attaching to said nucleic acid molecule a nucleic 
acid molecule consisting of the nucleotide sequence from about 10 to 100 bases 5' 
to any one of the fragments of the Streptococcus pneumoniae genome depicted in 
SEQ ID NOS: 1-391 and Tables 2 and 3 or a degenerate variant thereof. 



15. An isolated nucleic acid molecule encoding a homolog of any of the 
fragments of the Streptococcus pneumoniae genome of SEQ ID NOS: 1-391 and 
Tables 2 and 3, wherein said nucleic acid molecule is produced by a process 
comprising steps of: 

a) screening a genomic DNA library using as a probe a target sequence 
defined by any of SEQ ID NOS: 1-391 and Tables 2 and 3, including fragments 
thereof; 

b) identifying members of said library which contain sequences that 
hybridize to said target sequence; and 

c) isolating the nucleic acid molecules from said members identified in step 
(b). 

16. An isolated DNA molecule encoding a homolog of any one of the 
fragments of the Streptococcus pneumoniae genome of SEQ ID NOS: 1-391 and 
Tables 2 and 3, wherein said nucleic acid molecule is produced by a process 
comprising steps of: 

a) isolating mRNA, DNA, or cDNA produced from an organism; 

b) amplifying nucleic acid molecules whose nucleotide sequence is 
homologous to amplification primers derived from said fragment of said 
Streptococcus pneumoniae genome to prime said amplification; and 

c) isolating said amplified sequences produced in step (b). 



17. An isolated polypeptide encoded by any of the fragments of the 
Streptococcus pneumoniae genome of SEQ ID NOS: 1-391 and depicted in Tables 2 
and 3 or by a degenerate variant of said fragments. 

18. An isolated polynucleotide molecule encoding any one of the 
polypeptides of claim 17. 
19. An antibody which selectively binds to any one of the polypeptides of 
claim 17. 

20. A method for producing a polypeptide in a host cell comprising the 
steps of: 

a) incubating a host containing a heterologous nucleic acid molecule whose 
nucleotide sequence consists of any one of the fragments of the Streptococcus 
pneumoniae genome of SEQ ID NOS: 1-391 and depicted in Tables 2 and 3, under 
conditions where said heterologous nucleic acid molecule is expressed to produce 
said protein, and 

b) isolating said protein. 
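By way of illustration only, the three elements recited in claim 5 (data storage means, search means and retrieval means) can be sketched in Python as shown below. The dictionary-based store, the ungapped window comparison, the 95% search threshold and all function names are assumptions made for this sketch; the claims do not prescribe any particular search algorithm, and a practical system would more likely rely on an alignment-based similarity search.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, int, float]  # (sequence name, 0-based offset, percent identity)


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def search(store: Dict[str, str], target: str, min_identity: float = 95.0) -> List[Hit]:
    """Search means (hypothetical): slide the target along each stored sequence.

    Every window whose ungapped identity to the target is at least
    min_identity is reported as a hit.
    """
    hits: List[Hit] = []
    n = len(target)
    for name, seq in store.items():
        for offset in range(len(seq) - n + 1):
            identity = percent_identity(target, seq[offset:offset + n])
            if identity >= min_identity:
                hits.append((name, offset, identity))
    return hits


def retrieve(store: Dict[str, str], hits: List[Hit], length: int) -> List[str]:
    """Retrieval means (hypothetical): return the stored subsequence for each hit."""
    return [store[name][offset:offset + length] for name, offset, _ in hits]
```

Here the data storage means is simply a dictionary mapping a sequence name to its nucleotide string; the sequence names and contents used with it are placeholders, not entries from the sequence listing.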



